Learning Latent Variable Models by Pairwise Cluster Comparison: Part I - Theory and Overview

نویسندگان

  • Nuaman Asbeh
  • Boaz Lerner
چکیده

Identification of latent variables that govern a problem and the relationships among them, given measurements in the observed world, are important for causal discovery. This identification can be accomplished by analyzing the constraints imposed by the latents in the measurements. We introduce the concept of pairwise cluster comparison (PCC) to identify causal relationships from clusters of data points and provide a two-stage algorithm called learning PCC (LPCC) that learns a latent variable model (LVM) using PCC. First, LPCC learns exogenous latents and latent colliders, as well as their observed descendants, by using pairwise comparisons between data clusters in the measurement space that may explain latent causes. Since in this first stage LPCC cannot distinguish endogenous latent non-colliders from their exogenous ancestors, a second stage is needed to extract the former, with their observed children, from the latter. If the true graph has no serial connections, LPCC returns the true graph, and if the true graph has a serial connection, LPCC returns a pattern of the true graph. LPCC’s most important advantage is that it is not limited to linear or latent-tree models and makes only mild assumptions about the distribution. The paper is divided in two parts: Part I (this paper) provides the necessary preliminaries, theoretical foundation to PCC, and an overview of LPCC; Part II formally introduces the LPCC algorithm and experimentally evaluates its merit in different synthetic and real domains. The code for the LPCC algorithm and data sets used in the experiments reported in Part II are available online.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Learning Latent Variable Models by Pairwise Cluster Comparison: Part II - Algorithm and Evaluation

It is important for causal discovery to identify any latent variables that govern a problem and the relationships among them, given measurements in the observed world. In Part I of this paper, we were interested in learning a discrete latent variable model (LVM) and introduced the concept of pairwise cluster comparison (PCC) to identify causal relationships from clusters of data points and an o...

متن کامل

Pairwise Cluster Comparison for Learning Latent Variable Models

Learning a latent variable model (LVM) exploits values of the measured variables as manifested in the data to causal discovery. Because the challenge in learning an LVM is similar to that faced in unsupervised learning, where the number of clusters and the classes that are represented by these clusters are unknown, we link causal discovery and clustering. We propose the concept of pairwise clus...

متن کامل

Learning Latent Variable Models by Pairwise Cluster Comparison

Identification of latent variables that govern a problem and the relationships among them given measurements in the observed world are important for causal discovery. This identification can be made by analyzing constraints imposed by the latents in the measurements. We introduce the concept of pairwise cluster comparison PCC to identify causal relationships from clusters and a two-stage algori...

متن کامل

An Overview of the New Feature Selection Methods in Finite Mixture of Regression Models

Variable (feature) selection has attracted much attention in contemporary statistical learning and recent scientific research. This is mainly due to the rapid advancement in modern technology that allows scientists to collect data of unprecedented size and complexity. One type of statistical problem in such applications is concerned with modeling an output variable as a function of a sma...

متن کامل

The Comparison of Two Models for Evaluation of Pre-internship Comprehensive Test: Classical and Latent Trait

Introduction: Despite the widespread use of pre-internship comprehensive test and its importance in medical students’ assessment, there is a paucity of the studies that can provide a systematic psychometric analysis of the items of this test. Thus, the present study sought to assess March 2011 pre-internship test using classical and latent trait models and compare their results. Methods: In th...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Journal of Machine Learning Research

دوره 17  شماره 

صفحات  -

تاریخ انتشار 2016